-
Couldn't load subscription status.
- Fork 928
Allow name query for MPI_DATATYPE_NULL. #10029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, I don't have a problem with this PR. But is it mandated by the MPI spec, or is this an extension?
In the description for MPI_TYPE_GET_NAME in MPI-4.0 (p385 -- or anywhere in section 7.8, "Naming Object" starting on p381), it doesn't explicitly mention that you can pass invalid handles (like MPI_DATATYPE_NULL) to the MPI_*_GET_NAME functions. Indeed, MPI-4.0 p383:33-36 somewhat kinda sorta implies that you can't:
Thus, the names of MPI_COMM_WORLD, MPI_COMM_SELF, and the communicator returned by MPI_COMM_GET_PARENT (if not MPI_COMM_NULL) will have the default of "MPI_COMM_WORLD", "MPI_COMM_SELF", and "MPI_COMM_PARENT".
That being said, if this issue is ambiguous in the MPI standard text, I'm in favor of this functionality because it's one less check that the application program needs to make before calling MPI_*_GET_NAME.
What do we do for the other MPI_*_GET_NAME functions?
Do we know what MPICH does here?
|
The MPI standard define the name for all predefined datatypes, and MPI_DATATYPE_NULL is part of that group. Thus, I see no reason for the current in |
|
I guess I'm a little confused. Can you cite where in MPI-4.0 it says that invalid handles such as MPI_DATATYPE_NULL should have a name returned from MPI_TYPE_GET_NAME? The only description I see under MPI_TYPE_GET_NAME (MPI-4.0 p385:23-24) says:
It doesn't say that invalid handles are accepted by MPI_TYPE_GET_NAME. What am I missing? |
|
MPI_DATATYPE_NULL is not an invalid datatype, it is a NULL handle, and as such it is a predefined constant. Thus, I believe that the naming rule cited above applies. Forcing a library writer to check for MPI_DATATYPE_NULL before calling |
|
The other get_name functions do not suffer from the same problem. This PR is good to be merged. |
Yes they do. ompi/ompi/mpi/c/comm_get_name.c Line 55 in 580984b
which is ompi/ompi/communicator/communicator.h Lines 463 to 493 in 580984b
which explicitly checks for Additionally, MPI_WIN_GET_NAME invokes ompi/ompi/mpi/c/win_get_name.c Line 47 in 580984b
which is Lines 152 to 164 in 580984b
and explicitly checks for Just to be 100% sure, I tested this with a trivial program: #include <stdio.h>
#include <mpi.h>
int main()
{
int len;
char name[MPI_MAX_OBJECT_NAME];
MPI_Init(NULL, NULL);
#if 0
MPI_Type_get_name(MPI_DATATYPE_NULL, name, &len);
#elif 0
MPI_Comm_get_name(MPI_COMM_NULL, name, &len);
#else
MPI_Win_get_name(MPI_WIN_NULL, name, &len);
#endif
printf("Got name: %s\n", name);
MPI_Finalize();
return 0;
}I ran it with all 3 possibilities and Open MPI -- and MPICH 4.0 -- generate an error for all 3 Given that there is some ambiguity here, I think we should have the same behavior for all of the @bosilca if you fix up all the |
This applies to MPI_DATATYPE_NULL, MPI_COMM_NULL and MPI_WIN_NULL. MPI 4.0 states that "The predefined constant MPI_COMM_NULL is the value used for invalid communicator handles." Hence, we might assume that MPI_COMM_NULL is an invalid communicator, but that is only partially true because in 4.12.4 it is clearly stated that the translation function, MPI_*_C2F and MPI_*_F2C, should treat MPI_COMM_NULL as a valiid communicator. So, the real role of these MPI_*_NULL objects might be to point to valid MPI objects used as bounds, but they represent valid MPI objects that cannot be used for communications. In any case, as predefined handles they have an associated name, and the user _must_ be able to get this name. Signed-off-by: George Bosilca <[email protected]>
39c1030 to
124ca9c
Compare
|
Our implementation of ompi_*_invalid is, well, invalid. There is a lengthy comment in communicator.h on why we're doing the wrong thing (according to MPI 2.0), but we're fine with it because we have implemented shortcuts in the few places where it was needed (basically f2c and c2f functions). Anyway, if we assume that invalid means, improper for communications, then we should be good for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bosilca and I chatted on the phone about this this morning. Here's the conclusions we came to:
- We both agree that the
MPI_*_GET_NAMEfunctions should accept theMPI_*_NULLhandles. It's just better for users. - From that perspective, we should accept this PR.
- From the portability perspective, we should encourage MPICH to adopt this behavior as well. Because it's better for users.
- For simplicity (assuming this PR is accepted), let's bring it to v5.0. But let's not backport it (i.e., do not bring it to v4.1.x or v4.0.x).
- If the MPI Forum accepts mpi-forum/mpi-issues#544, 🎉 . If not, we can revert this PR.
|
|
||
| /* Setup MPI_WIN_NULL */ | ||
| OBJ_CONSTRUCT(&ompi_mpi_win_null.win, ompi_win_t); | ||
| ompi_mpi_win_null.win.w_flags = OMPI_WIN_INVALID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe to do? Are there other places that rely on this flag being set upon construction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only use I could find is in https://github.com/open-mpi/ompi/blob/master/ompi/win/win.h#L155 where the window is explicitly checked for MPI_WIN_NULL as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All functions check the validity of the window using ompi_win_invalid, a functions that specifically checks for MPI_WIN_NULL (in addition to checking the OMPI_WIN_INVALID flag). No other usages of OMPI_WIN_INVALID exists in the code.
It is in fact the same story with all the _INVALID flags so it would be legitimate to wonder why do we need the _INVALID flag at all, but I don't think we want to have this discussion in the context of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If OMPI_WIN_INVALID is only used in 1) the constructor and 2) in ompi_win_invalid(), and you just removed it out of the constructor, then we should also remove it out of ompi_win_invalid(), too. Otherwise it's a dangling check for something that can never be true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's exactly what I said above (and it is true for all uses of _INVALID). But I don't think it should be part of this PR.
|
I disagree with this PR. I think our current behavior is required by the standard and don't believe we should commit this without clarification from the MPI Forum. I think this is the desired behavior, but I also believe it violates the MPI standard (meaning we shouldn't do it without clarification on the standard). MPI_COMM_GET_NAME (and friends) operate on a communicator. MPI_COMM_NULL is not a communicator; it's the NULL handle. The first sentence of MPI_COMM_GET_NAME is:
Unless we all agree that you can call MPI_COMM_SET_NAME on MPI_COMM_NULL, and I would claim that the definition of MPI_COMM_SET_NAME:
prohibits calling SET_NAME on MPI_COMM_NULL because, again, MPI_COMM_NULL is not a communicator, then we can't have GET_NAME handle the NULL handles and conform to the MPI specification. The bit in MPI_COMM_GET_NAME about predefined communicators is also not ambiguous today. It says:
Note (1) the explicit callout of MPI_COMM_NULL as an exception and (2) that it says predefined communicators, not predefined communicator handles. Again, MPI_COMM_NULL is not a communicator, so this section would not apply. The same argument applies to the invalid/valid flag. MPI_COMM_NULL is not a valid communicator, because it isn't a communicator. At one point, we used to mark objects (comm, win, etc.) invalid once they were freed, because the pointer to them was still "valid", from the sense that nothing prohibited a caller from using that pointer to a MPI_Send and it was likely still valid memory full of a semi-sane looking communicator that wouldn't actually work. George's comment makes me think that code got removed at some point. But I don't think removing the invalid is the right choice here, either. |
|
Rescinded my review based on further MPI standard evidence presented above. |
@gpaulsen Did you look at mpi-forum/mpi-issues#544? |
|
Once rebased this shows little difference with main. I checked with the example above and all NULL objects return the correct name with main, this PR is unnecessary. |
Fixes #10028
Signed-off-by: George Bosilca [email protected]